Skip to content

Go: add experimental query for IDNA digit-fold IP-literal smuggling#21784

Open
astrogilda wants to merge 4 commits intogithub:mainfrom
astrogilda:experimental-go-idna-ip-literal-smuggle
Open

Go: add experimental query for IDNA digit-fold IP-literal smuggling#21784
astrogilda wants to merge 4 commits intogithub:mainfrom
astrogilda:experimental-go-idna-ip-literal-smuggle

Conversation

@astrogilda
Copy link
Copy Markdown

@astrogilda astrogilda commented May 4, 2026

Summary

  • Adds a new experimental Go query go/idna-ip-literal-smuggle under go/ql/src/experimental/CWE-918/ that detects the UTS-46 IDNA digit-fold IP-literal smuggling anti-pattern.
  • Stateful taint configuration (TaintTracking::GlobalWithState) with two flow states (TPreIdna, TPostIdna) and a barrier that requires a trailing-dot trim followed by net.ParseIP / netip.ParseAddr to clear post-IDNA flow.
  • Inline-expectation tests with 23 unique sink alerts on the positive fixture and 1 intended-positive on the adversarial-witness binding case; 0 alerts on the negative fixture.

Motivation

golang.org/x/net/idna applies UTS-46 NFKC mapping inside (*Profile).ToASCII and (*Profile).ToUnicode for the Lookup, Display, Registration, and MapForLookup-derived profiles. The mapping folds 100 distinct non-ASCII Unicode digit codepoints partitioned into seven Unicode-block ranges (Latin-1 superscripts, mathematical superscripts, mathematical subscripts, circled digits, fullwidth digits, the Mathematical Alphanumeric Symbols block covering bold, double-struck, sans-serif, sans-serif-bold, and monospace digit styles, and segmented digits) to their ASCII equivalents. Devanagari digits (U+0966..U+096F) are not in scope: empirically verified against golang.org/x/net/idna v0.53.0, they pass through Punycode rather than fold to ASCII. The library contains no IP-literal detection, so a caller that runs net.ParseIP only before idna.ToASCII will accept a smuggled IPv4 literal such as "0.¹.0.0" (which maps to "0.1.0.0") into a downstream network sink that assumed the value was a domain name.

Single-state taint tracking is structurally insufficient: a pre-IDNA net.ParseIP barrier must NOT block flow that transitions through the IDNA call. The query therefore uses two flow states. Sources start in TPreIdna. The IDNA mapping call is modeled as a state-transition step that flips TPreIdna -> TPostIdna. Sinks are flagged only in TPostIdna. The barrier is net.ParseIP / net.ParseCIDR / netip.ParseAddr / netip.ParsePrefix, but only when the value reaching the parser was produced by a trailing-dot trim (strings.TrimRight(_, "."), strings.TrimSuffix(_, "."), or the manual if strings.HasSuffix(out, ".") { out = out[:len(out)-1] } slice form) of the IDNA output. The trim is required because "0.¹.0.0." maps to "0.1.0.0.", which net.ParseIP rejects on its own yet is still an IP literal for routing purposes.

The detection-strategy design choice (CodeQL inter-procedural taint as the primary recall vehicle, with Semgrep OSS as a high-precision direct-call secondary sweep) is documented in the upstream rules repo at docs/research/v0.1-detection-strategy.md. The short version: CodeQL TaintTracking::GlobalWithState is inter-procedural by default and propagates through one-deep wrappers like idnaASCII without an explicit isAdditionalFlowStep. That matters because the only two production callsites of any UTS-46-mapping profile across golang/go, kubernetes/kubernetes, and prometheus/prometheus (a 660 MB sweep) are wrapped by such a helper. A blanket structural rule of the form "ToASCII without recheck in same block" was considered and rejected: an empirical sweep across 31 production callsites in 19 repositories shows the rule would fire on PSL walkers, registrar pipelines, and TLS-SNI normalization sites where the mapped value never reaches a network-routing decision driven by attacker input, and the false-positive density is too high to ship as a default.

Scope is IPv4 only. IPv6 colons are rejected by IDNA rune-validation before UTS-46 mapping runs, so no IPv6 smuggle path exists.

Verification

Compile and unit-test commands (run from a working tree of github/codeql, with the staged files in place):

codeql query compile go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.ql
codeql test run     go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/

Local results from CodeQL CLI 2.25.3:

  • query compile: clean (1/1 success, 17 s).
  • test run: positive fixture produces 23 unique sink alerts plus 1 intended-positive on the adversarial-witness binding case; negative fixture produces 0 alerts. Inline-expectations match.

The positive fixture exercises the canonical wrapper shape modeled on golang.org/x/net/http/httpproxy.canonicalAddr (a one-deep idnaASCII helper consumed by both net.JoinHostPort and net.Dial-class sinks). The query fires twice on this shape, once per sink. That is the empirical evidence that TaintTracking::GlobalWithState propagates through the wrapper with no additional-flow-step model required.

The negative fixture covers the safe pattern (post-IDNA strings.TrimRight / strings.TrimSuffix / manual HasSuffix slice followed by net.ParseIP or netip.ParseAddr), the Punycode profile (which has a nil mapping function and is correctly excluded from the idnaMappingCall predicate), and unrelated hostname use that never transits IDNA.

Disposition by the upstream Go security team

The Go security team reviewed the underlying digit-fold behavior and declined to treat it as a library bug on 2026-04-29: their position is that UTS-46 is correctly implemented and the recheck obligation belongs to the caller. This query implements the caller-side detection that follows from that disposition. It is not a substitute for a library fix; it is the alert that surfaces the contract on the call site where the recheck is missing.

Source artefacts

The query, library, qhelp, examples, and test fixtures originate from the upstream rules repository astrogilda/idna-ip-literal-smuggle-rules (codeql/ subdirectory). The upstream repo also carries the corpus-eval scripts, the v0.1.x detection-strategy synthesis under docs/research/, and complementary Semgrep and gopatch rules for the same anti-pattern.

CLA

I will sign the CLA on first PR comment per the registry workflow.

References

Cross-language analogs of the same caller-side IP-literal-validation gap that this query addresses for Go IDNA:

  • CVE-2021-29923: Go net.ParseIP accepted IPv4 octets with leading zeros, allowing SSRF allowlist bypass via octal interpretation downstream.
  • CVE-2024-12224: Caller-side IP-literal validation gap producing comparable allowlist bypass behavior.
  • CVE-2024-3651: idna (Python) UTS-46 / Punycode handling input that drove disproportionate parser cost; same library family, different failure mode but illustrates the recurring pattern of UTS-46 input requiring caller-side validation that the library does not provide.

Adds go/idna-ip-literal-smuggle under go/ql/src/experimental/CWE-918/.

The query detects callers that pass an untrusted hostname through
golang.org/x/net/idna (UTS-46 NFKC mapping folds 100 non-ASCII Unicode
digit codepoints to ASCII) into a network sink without rechecking the
post-IDNA value as an IP literal after a trailing-dot trim. Without
the recheck, an input such as "0.[U+00B9].0.0" maps to "0.1.0.0" and
slips past any pre-IDNA net.ParseIP guard, smuggling an IPv4 literal
into net.JoinHostPort, net.Dial, http.Request.URL.Host,
tls.Config.ServerName, http.Cookie.Domain, and the DNS resolver
entry points.

Implementation uses TaintTracking::GlobalWithState with two flow
states (TPreIdna, TPostIdna). The IDNA mapping call is a state
transition step. The barrier is a trailing-dot trim followed by
net.ParseIP / net.ParseCIDR / netip.ParseAddr / netip.ParsePrefix in
TPostIdna; a bare ParseIP without the prior trim does not sanitize
because "0.1.0.0." is rejected by ParseIP yet remains a valid IP
literal for routing. IPv6 is out of scope because the colon is a
UTS-46 disallowed rune.

Tests: 23 unique sink alerts on the positive fixture (which includes
the canonical canonicalAddr-shape wrapper), no false positives on the
negative fixture (safe pattern, Punycode profile, non-IDNA hostname
use). codeql test run reports 1 of 1 PASSED.
@astrogilda astrogilda requested a review from a team as a code owner May 4, 2026 06:43
Copilot AI review requested due to automatic review settings May 4, 2026 06:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new experimental Go security query for detecting IDNA UTS-46 digit-fold cases where untrusted hostnames become IPv4 literals after IDNA normalization and then flow into network-relevant sinks without a post-IDNA IP recheck.

Changes:

  • Adds a new stateful taint-tracking library/query under go/ql/src/experimental/CWE-918/ plus qhelp and example code.
  • Adds positive and negative inline-expectation fixtures covering sinks, barriers, and a witness-binding regression case.
  • Adds a dedicated Go test module and expected output for the new experimental query.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/positives.go Positive inline-expectation cases for vulnerable IDNA-to-sink flows.
go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/negatives.go Negative and regression fixtures for safe post-IDNA rechecks and non-sink cases.
go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/go.mod Standalone module/dependency setup for the new Go test fixture.
go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/IdnaIpLiteralSmuggle.qlref Connects the test directory to the experimental query and inline postprocessors.
go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/IdnaIpLiteralSmuggle.expected Generated expected output baseline for the inline-expectation test run.
go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggleGood.go Safe example used by the query help documentation.
go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggleBad.go Vulnerable example used by the query help documentation.
go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.qll Core stateful taint-tracking model for IDNA transitions, barriers, and sinks.
go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.ql Query entry point and result metadata for the new path-problem query.
go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.qhelp End-user documentation, remediation guidance, and examples for the query.

Comment on lines +106 to +118
// Compliant: post-IDNA TrimSuffix + netip.ParseAddr recheck before (*Resolver).LookupIPAddr.
func compliantResolverLookupIPAddr(req *http.Request) {
host := req.Header.Get("X-HOST-IPADDR-OK")
ace, err := idna.Lookup.ToASCII(host)
if err != nil {
return
}
candidate := strings.TrimSuffix(ace, ".")
if _, parseErr := netip.ParseAddr(candidate); parseErr == nil {
return
}
r := &net.Resolver{}
r.LookupIPAddr(context.Background(), ace) // OK: post-IDNA recheck barrier
Comment on lines +163 to +181
// --- Class 8: Devanagari digits (U+0966..U+096F) ---
// UTS-46 NFKC folds Devanagari digits to ASCII equivalents.
// "१.0.0.0" (U+0967 DEVANAGARI ONE) -> "1.0.0.0"
func smuggleDevanagariDigit(req *http.Request) {
host := req.Header.Get("X-HOST-DEVANAGARI") // $ Source
ace, _ := idna.Lookup.ToASCII(host)
http.Get("https://" + ace + "/") // $ Alert
}

// --- Class 8 second positive: Devanagari digit U+0969 -> "3",
// (*net.Resolver).LookupIPAddr sink ---
// "१.0.0.३" -> "1.0.0.3"
func smuggleDevanagariResolverLookupIPAddr(req *http.Request) {
host := req.Header.Get("X-HOST-DEVANAGARI-TWO") // $ Source
ace, _ := idna.Lookup.ToASCII(host)
r := &net.Resolver{}
r.LookupIPAddr(context.Background(), ace) // $ Alert
}

Comment on lines +31 to +32
* `(*idna.Profile).ToASCII` (and the package-level `idna.ToASCII`,
* `Lookup.ToASCII`, `MapForLookup().ToASCII`) is modeled as a
ace, _ := idna.Punycode.ToASCII(host)
http.Get("https://" + ace + "/") // OK: no digit-fold profile
}

Comment on lines +72 to +73
Vulnerable pattern. <code>net.ParseIP</code> is called only before
<code>idna.ToASCII</code>, so the smuggled literal slips through:
Comment on lines +11 to +22
that folds <strong>100 distinct non-ASCII Unicode digit codepoints</strong>
across 8 families to their ASCII equivalents. The 8 families are:
</p>
<ul>
<li>Latin-1 superscripts (U+00B2, U+00B3, U+00B9): 3 codepoints</li>
<li>Mathematical superscripts (U+2070, U+2074..U+2079): 7 codepoints</li>
<li>Mathematical subscripts (U+2080..U+2089): 10 codepoints</li>
<li>Circled digits (U+2460..U+2468, U+24EA): 10 codepoints</li>
<li>Fullwidth digits (U+FF10..U+FF19): 10 codepoints</li>
<li>Mathematical bold, sans-serif, double-struck, and monospace digits
(U+1D7CE..U+1D7FF): 50 codepoints</li>
<li>Segmented digits (U+1FBF0..U+1FBF9): 10 codepoints</li>
Comment on lines +61 to +65
single trailing dot and call <code>net.ParseIP</code> (or
<code>netip.ParseAddr</code>) on the result, then reject on non-nil. The
trailing-dot trim is required because <code>"0.¹.0.0."</code> maps to
<code>"0.1.0.0."</code>, which <code>net.ParseIP</code> rejects on its
own yet is still an IP literal for routing purposes.
Comment on lines +86 to +97
The safe pattern accepts three equivalent trailing-dot trim forms:
</p>
<ul>
<li><code>strings.TrimRight(ace, ".")</code>: multi-dot form. Handles
the fullwidth and ideographic dot variants that produce multiple
trailing ASCII dots after UTS-46 mapping.</li>
<li><code>strings.TrimSuffix(ace, ".")</code>: single-dot form.
Sufficient for most inputs but incomplete for the multi-dot
variant.</li>
<li><code>if strings.HasSuffix(ace, ".") { ace = ace[:len(ace)-1] }</code>:
manual slice form. Equivalent to <code>TrimSuffix</code> in
effect.</li>
Comment on lines +51 to +56
// True-negative: caller uses idna.Display for human rendering only; the
// output never reaches a network sink in this function.
func displayOnly(req *http.Request) {
host := req.Header.Get("X-HOST-DISPLAY")
disp, _ := idna.Display.ToUnicode(host)
_ = disp // OK: never reaches a sink
Comment on lines +3 to +10
* @description An untrusted hostname flows through `golang.org/x/net/idna`
* mapping (which folds 100 non-ASCII Unicode digit codepoints to
* ASCII via UTS-46 NFKC) and reaches a security-relevant
* hostname sink without a post-IDNA IP-literal recheck. A
* caller that calls `net.ParseIP` only BEFORE `idna.ToASCII`
* will accept a smuggled IPv4 literal such as `"0.¹.0.0"`
* (which maps to `"0.1.0.0"`). Scope is IPv4 only because
* IPv6 colons are rejected by IDNA rune-validation before
astrogilda added 2 commits May 4, 2026 11:08
Fix several drift issues between the prose and the predicates:

- qhelp now lists the seven Unicode-block ranges that account for the
  100 fold codepoints, replacing the prior "8 families" claim that did
  not match the bullet list. Adds an explicit note that Devanagari
  digits do not fold (verified empirically against
  golang.org/x/net/idna v0.53.0) and that Registration disallows every
  fold codepoint at rune validation.
- qhelp recommendation corrects netip.ParseAddr semantics (parsed via
  err == nil, not via a non-nil return) and rewrites the trim-form
  list to state that TrimSuffix and the manual slice are not
  equivalent to TrimRight for multi-trailing-dot inputs.
- qhelp example caption matches the actual sample, which shows the
  no-recheck shape, not a pre-IDNA ParseIP shape.
- qll module docstring now covers both ToASCII and ToUnicode on the
  digit-folding profiles, lists the actual idna.New(MapForLookup)
  construction shape, and explicitly states that the package-level
  idna.ToASCII helper and the Punycode profile are excluded.
- ql @description and select message reflect that the model covers
  both ToASCII and ToUnicode and that the no-recheck case is the
  primary anti-pattern, not just the pre-IDNA-ParseIP case.
Three fixture changes plus a baseline refresh:

- Drop the two Devanagari positives. Empirical testing against
  golang.org/x/net/idna v0.53.0 confirms that U+0966..U+096F do not
  fold to ASCII via UTS-46; they pass through Punycode (xn--*) on all
  four profiles. Keeping the cases in would be misleading because the
  query fires structurally without the runtime smuggle ever existing.
- Add two positives covering Profile.ToUnicode on Latin-1 and Math
  superscript inputs. The library runs validateAndMap before the
  encode-vs-decode branch, so ToUnicode produces the same digit-folded
  ASCII output as ToASCII for the in-scope codepoints. The earlier
  fixture only exercised ToASCII despite the model handling both.
- Add three negatives. Two pin the ParseCIDR and ParsePrefix branches
  of the recheck-input predicate, which had no sink-reaching coverage
  before. The third pins the documented exclusion of the package-level
  idna.ToASCII helper against future broadening of the call matcher.

Baseline refreshed via codeql test run --learn after the fixture
changes shifted line numbers and the new select message text replaced
the old one.
@astrogilda
Copy link
Copy Markdown
Author

Pushed two follow-ups for the review pass.

docs: align qhelp/qll/ql with implemented model (6392c52)

  • qhelp recommendation: corrected netip.ParseAddr semantics (parsed via err == nil, not via a non-nil return).
  • qhelp example caption: matches the actual IdnaIpLiteralSmuggleBad.go sample, which shows the no-recheck shape, not a pre-IDNA ParseIP shape.
  • qhelp trim-form list: states explicitly that TrimSuffix and the manual slice form are not equivalent to TrimRight for multi-trailing-dot inputs.
  • qhelp family count: replaced "8 families" with the seven Unicode-block ranges that account for the 100 fold codepoints. Added a note that Devanagari digits are out of scope, verified empirically against golang.org/x/net/idna v0.53.0.
  • qll header: now covers both ToASCII and ToUnicode on the digit-folding profiles, lists the actual idna.New(idna.MapForLookup(), ...) construction shape, and explicitly states that the package-level idna.ToASCII helper is excluded.
  • ql @description and select message: cover both ToASCII and ToUnicode, frame the no-recheck case as the primary anti-pattern.

test: tighten fixture coverage and refresh baseline (771b433)

  • Dropped the two Devanagari positives. Empirical probe against golang.org/x/net/idna v0.53.0: U+0966..U+096F do not fold under any of the four UTS-46 profiles; they pass through Punycode (xn--*). Keeping the cases would have been misleading because the query fired structurally without the runtime smuggle ever existing.
  • Added two ToUnicode positives (Latin-1 superscript on Lookup, Math superscript on Display). The library runs validateAndMap before the encode-vs-decode branch, so ToUnicode produces the same digit-folded ASCII output as ToASCII for in-scope codepoints.
  • Added safe-pattern negatives for net.ParseCIDR and netip.ParsePrefix. Both were modeled in the recheck-input predicate but had no sink-reaching coverage before.
  • Added a negative for the package-level idna.ToASCII helper to pin the documented exclusion against future broadening of the call matcher.

Baseline refreshed via codeql test run --learn after the fixture changes shifted line numbers and the new select message text replaced the old one.

One precision gap I noticed during the round but did not pull into this commit, for transparency: the idnaMappingCall Punycode exclusion is syntactic and would miss aliased forms like pp := idna.Punycode; pp.ToASCII(...). A localFlow-from-idna.Punycode form would catch them. Marginal in practice. I can fold it into a third commit on this branch if you prefer.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

QHelp previews:

go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.qhelp

IDNA digit-fold IP-literal smuggling via UTS-46 NFKC mapping

The Go module golang.org/x/net/idna implements UTS-46 IDNA processing. On the Lookup and Display profiles (and any profile constructed via idna.New(idna.MapForLookup(), ...)), both (*Profile).ToASCII and (*Profile).ToUnicode apply an NFKC-based character map that folds 100 distinct non-ASCII Unicode digit codepoints to their ASCII equivalents. The 100 codepoints partition into the following Unicode-block ranges:

  • Latin-1 superscripts (U+00B2, U+00B3, U+00B9): 3 codepoints
  • Mathematical superscripts (U+2070, U+2074..U+2079): 7 codepoints
  • Mathematical subscripts (U+2080..U+2089): 10 codepoints
  • Circled digits (U+2460..U+2468, U+24EA): 10 codepoints
  • Fullwidth digits (U+FF10..U+FF19): 10 codepoints
  • Mathematical Alphanumeric Symbols digits, spanning bold, double-struck, sans-serif, sans-serif-bold, and monospace styles (U+1D7CE..U+1D7FF): 50 codepoints
  • Segmented digits (U+1FBF0..U+1FBF9): 10 codepoints
    Devanagari digits (U+0966..U+096F) are not in scope: empirical testing against golang.org/x/net/idna v0.53.0 confirms they do not fold to ASCII via UTS-46. The Registration profile is structurally covered by the rule but disallows every fold codepoint at the rune-validation stage, so a caller that respects the returned error never sees a smuggled literal from that profile in practice.

The library contains no IP-literal detection. A caller that applies UTS-46 mapping to an attacker-controlled host string and consumes the result in a network sink without rechecking against IP-literal parsers receives a valid ASCII IPv4 literal back as the "domain name" output. Any downstream allowlist check, SSRF guard, NoProxy match, or TLS-SNI router that does not re-check the post-IDNA result is bypassed. The anti-pattern also applies to callers that do a pre-IDNA net.ParseIP check and think it is sufficient: the smuggled host is not ASCII, so the pre-IDNA check rejects it as non-IP, and the post-IDNA value (now a numeric literal) reaches the sink unguarded.

IPv6 is out of scope: : is a UTS-46 disallowed character; bare-IPv6 inputs are rejected by IDNA rune-validation before any digit-fold mapping runs.

Sinks where the smuggled literal becomes exploitable include net.JoinHostPort, net.Dial, (*http.Request).URL.Host, (*tls.Config).ServerName, (*http.Cookie).Domain, and any HTTP client request URL constructed from the mapped value.

Recommendation

Either:

  1. Use a strict IDNA profile option that returns an error if the mapped output parses as an IP literal, if your IDNA library exposes one.
  2. Apply the explicit safe pattern: after the IDNA mapping call, strip trailing dots from the result and parse it. Reject if net.ParseIP returns a non-nil address, or if netip.ParseAddr returns no error (note the inverted convention: netip.ParseAddr reports a successfully parsed address via err == nil, not via a non-zero return). The trailing-dot strip is required because "0.¹.0.0." maps to "0.1.0.0.", which a bare net.ParseIP rejects on its own yet is still an IP literal for routing purposes; the strip exposes the literal so the parser sees it.

Example

Vulnerable pattern. The host string is mapped through the IDNA profile and reaches a network sink with no post-IDNA IP-literal recheck:

package main

import (
	"net"
	"net/http"
	"net/url"

	"golang.org/x/net/idna"
)

// VulnerableLookup mirrors the shape of the anti-pattern as it appears in
// real Go code: an attacker-controlled host string is canonicalised through
// idna.Lookup.ToASCII, the result is consumed by a network sink, and there
// is no post-IDNA recheck against IP-literal parsers. UTS-46 NFKC mapping
// folds 100 non-ASCII digit codepoints (e.g. fullwidth, mathematical
// superscripts, circled, segmented) to their ASCII equivalents, so an input
// like "0.¹.0.0" emerges from ToASCII as "0.1.0.0" and reaches the sink
// as a routable IPv4 literal.
func VulnerableLookup(host string) (*http.Response, error) {
	ace, err := idna.Lookup.ToASCII(host)
	if err != nil {
		return nil, err
	}
	return http.Get("https://" + ace + "/")
}

// VulnerableProxyRoute mirrors the canonicalAddr shape used in callers that
// canonicalise a URL host before applying network policy. The host is read
// from an attacker-controlled URL, mapped through an idna.Lookup.ToASCII
// wrapper, and passed to net.JoinHostPort without an IP-literal recheck.
func VulnerableProxyRoute(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	addr := u.Hostname()
	if v, err := idna.Lookup.ToASCII(addr); err == nil {
		addr = v
	}
	return net.JoinHostPort(addr, "443"), nil
}

Safe pattern. Post-IDNA trailing-dot strip followed by net.ParseIP recheck:

package main

import (
	"net"
	"net/http"
	"strings"

	"golang.org/x/net/idna"
)

// SafeLookup applies the safe pattern: a post-IDNA trailing-dot trim
// followed by net.ParseIP. The trim is required because "0.¹.0.0." maps
// to "0.1.0.0." which net.ParseIP rejects on its own yet is still
// routable as 0.1.0.0 in the rest of the stack.
func SafeLookup(host string) (*http.Response, error) {
	ace, err := idna.Lookup.ToASCII(host)
	if err != nil {
		return nil, err
	}

	// Post-IDNA trailing-dot trim, then re-check. TrimRight (not
	// TrimSuffix) handles multiple trailing dots that UTS-46 mapping can
	// produce when fullwidth/ideographic dots compose with ASCII dots.
	candidate := strings.TrimRight(ace, ".")
	if ip := net.ParseIP(candidate); ip != nil {
		return nil, errBadHost
	}

	return http.Get("https://" + ace + "/")
}

var errBadHost = errIPLiteral{}

type errIPLiteral struct{}

func (errIPLiteral) Error() string { return "ip literals not allowed" }

The safe pattern accepts three trailing-dot strip forms. They are not equivalent in coverage:

  • strings.TrimRight(ace, "."): strict form. Strips all trailing dots, so the multi-dot residue produced when UTS-46 maps the fullwidth dot U+FF0E or the ideographic dot U+3002 next to ASCII dots is fully removed.
  • strings.TrimSuffix(ace, "."): lenient form. Strips only one trailing dot. Sufficient for the canonical "0.1.0.0." shape but leaves residue if multiple trailing dots were produced by mapping.
  • if strings.HasSuffix(ace, ".") { ace = ace[:len(ace)-1] }: manual single-dot slice. Behaves identically to TrimSuffix in coverage and inherits the same multi-dot-residue limitation.
    Callers whose threat model includes the multi-trailing-dot variant should prefer strings.TrimRight. After the strip, parse with netip.ParseAddr (preferred) or net.ParseIP and reject if the value parses as an IP literal (err == nil for the former, non-nil return for the latter).

References

Comment on lines +350 to +353
/**
* The IDNA mapping is modeled as a state-transition step:
* `TPreIdna(arg) -> TPostIdna(result)`
*/
Comment on lines +362 to +370
/**
* A correct post-IDNA IP-literal recheck (trailing-dot trim FOLLOWED
* BY `net.ParseIP` or equivalent) is a barrier in `TPostIdna`. The
* trim source is bound to the post-IDNA-tainted predecessor so that
* an unrelated TrimRight + ParseIP construct elsewhere in the same
* scope does not silently sanitize the IDNA-tainted path. A bare
* `net.ParseIP` without the prior trim is NOT a barrier; the alert
* remains.
*/
*/
predicate isBarrier(DataFlow::Node node, FlowState state) {
state = TPostIdna() and
exists(DataFlow::Node postIdnaResult, DataFlow::Node parseInput |
Collapses a single double-space before an inline comment that gofmt
normalizes to one space. No semantic change. Fixes the
make check-formatting CI step that flagged the file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants